Stanford Encyclopedia of Philosophy - Who is being cited?

Introduction

The Stanford Encyclopedia of Philosophy (https://plato.stanford.edu/) is often considered to contain the condensed knowledge of (academic) philosophy. As such, it is a natural object of study for anyone who wants to learn something about the field of academic philosophy. This is what I aim to do in this notebook.

In particular, I am interested in examining to what degree the different genders are represented in the encyclopedia. To this end, we will analyze a data set containing all the references that appear in the encyclopedia, that is, all the books and journal articles that are cited (a description of how I gathered the data can be found in a separate notebook). I will address the following questions:

The data set

Let us begin by loading the data set and examining its properties.

For each reference, the data set contains the first author, the year of publication, and the title of the article or book. It contains 146472 entries (erroneous references and references that did not fit the expected format were preprocessed beforehand; see the data notebook for more details).
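A minimal sketch of the loading step with Python's standard `csv` module follows. It is not the notebook's own code: the column names `author`, `year`, and `title` and the file name `sep_references.csv` are hypothetical, and the toy rows are invented for illustration.

```python
import csv
import io
from typing import TextIO


def load_references(handle: TextIO) -> list[dict]:
    """Read references from a CSV with hypothetical columns author, year, title.

    Rows whose year cannot be parsed as an integer are skipped, a simple
    stand-in for the preprocessing described in the data notebook.
    """
    references = []
    for row in csv.DictReader(handle):
        try:
            year = int(row["year"])
        except (KeyError, TypeError, ValueError):
            continue  # skip references that do not fit the expected format
        references.append(
            {"author": row["author"].strip(), "year": year, "title": row["title"].strip()}
        )
    return references


# Invented toy rows; the second one has no usable year and is dropped.
sample = io.StringIO(
    "author,year,title\n"
    '"Doe, Jane",1986,An Example Monograph\n'
    '"Roe, Richard",n.d.,An Undated Essay\n'
)
refs = load_references(sample)
print(len(refs), refs[0]["author"])
```

On a real file this would be called as `with open("sep_references.csv", newline="", encoding="utf-8") as f: refs = load_references(f)`; in a notebook, `pandas.read_csv` is the more common choice.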

Inferring an author's gender

We want to analyze how works by men and women are represented in the encyclopedia. The only features at our disposal are the name of the first author, the year of publication, and the title of the work; we have no direct information about an author's gender. An author's first name, however, allows us to guess their gender with relatively high reliability. This is, of course, a simplification: it disregards non-binary authors who do not identify with either gender, it may misclassify some transgender authors, and it gets wrong names that are used in atypical ways (Hilary Putnam, for instance, is a male philosopher whose first name is more commonly given to girls). Nonetheless, under the assumption that these errors occur (roughly) equally often for authors categorized as male or female, the first name allows us to obtain a reasonably accurate estimate of the proportion of female-authored references in the encyclopedia. It is important to stress that this yields only an estimate, not exact numbers.

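We use the Python package gender-guesser, which categorizes first names into six categories: "male", "female", "mostly_male", "mostly_female", "andy" (androgynous), and "unknown" (name not found in the package's database).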
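Applying the guesser first requires extracting a first name from each reference. The sketch below assumes first authors are stored as "Last, First Middle" strings (or "First Last" when there is no comma); this format is an assumption, not something the text confirms. The detector is passed in as a parameter, so any object with a `get_gender(name)` method works; the gender-guesser package's `gender_guesser.detector.Detector` provides such a method. The helper names `first_name` and `guess_gender` are hypothetical.

```python
from typing import Protocol


class GenderDetector(Protocol):
    """Anything with a get_gender(name) method, e.g. gender_guesser's Detector."""

    def get_gender(self, name: str) -> str: ...


def first_name(author: str) -> str:
    """Extract the first given name, assuming "Last, First Middle" or "First Last"."""
    given = author.split(",", 1)[1] if "," in author else author
    tokens = given.split()
    # "Christine M." -> "Christine"
    return tokens[0] if tokens else ""


def guess_gender(author: str, detector: GenderDetector) -> str:
    """Guess the gender category of an author's first name."""
    name = first_name(author)
    # Bare initials such as "D." carry no information about gender.
    if len(name.rstrip(".")) <= 1:
        return "unknown"
    return detector.get_gender(name)
```

With the real package installed, `import gender_guesser.detector` and pass `gender_guesser.detector.Detector()` as the second argument. Treating initials as "unknown" is a design choice of this sketch: citations often abbreviate first names, and an initial cannot be classified.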

The only blatant mistake among the highly cited first authors seems to be that Hilary Putnam is categorized as female.

Analysis

Let us now explore the gender differences in the data, beginning with the overall distribution of the first authors' (guessed) gender across all references.

As we can see, 75.26% of references are works (guessed to be) authored by men, and only 15.31% are works (guessed to be) authored by women. The other four categories together account for less than 10% of references (9.43%). To simplify the analysis, we will therefore focus only on references whose first author can be categorized as either male or female. Moreover, we restrict the analysis to works published in 1950 or later.
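The category shares and the two restrictions can be sketched as follows. This assumes, hypothetically, that each reference is a dict with a `gender` key (the gender-guesser category) and a `year` key; the notebook itself presumably works with a pandas DataFrame, and the toy references are invented.

```python
from collections import Counter


def gender_shares(genders: list[str]) -> dict[str, float]:
    """Percentage of references in each gender category, largest first."""
    counts = Counter(genders)
    total = sum(counts.values())
    return {gender: 100 * n / total for gender, n in counts.most_common()}


def restrict(references: list[dict], min_year: int = 1950) -> list[dict]:
    """Keep male- or female-classified references published in or after min_year."""
    return [
        r for r in references
        if r["gender"] in {"male", "female"} and r["year"] >= min_year
    ]


# Invented toy references (one dict per reference).
refs = [
    {"gender": "male", "year": 1949},
    {"gender": "female", "year": 1980},
    {"gender": "andy", "year": 1990},
    {"gender": "male", "year": 2001},
]
print(gender_shares([r["gender"] for r in refs]))
print(restrict(refs))
```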

For each year individually, we will now count the citations by gender.
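Counting by year amounts to a two-level tally. A sketch with `collections.Counter`, under the same hypothetical assumption that each reference is a dict with `year` and `gender` keys; with pandas, `df.groupby(["year", "gender"]).size().unstack(fill_value=0)` would produce the equivalent table.

```python
from collections import Counter, defaultdict


def counts_by_year(references: list[dict]) -> dict[int, dict[str, int]]:
    """Map each publication year to the number of references per gender."""
    table = defaultdict(Counter)
    for r in references:
        table[r["year"]][r["gender"]] += 1
    # Sort by year so the result can be plotted directly as a time series.
    return {year: dict(table[year]) for year in sorted(table)}


# Invented toy references.
refs = [
    {"year": 1990, "gender": "male"},
    {"year": 1990, "gender": "female"},
    {"year": 1985, "gender": "male"},
    {"year": 1990, "gender": "male"},
]
print(counts_by_year(refs))
```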

Next, we plot how the number of citations by gender changes with the publication year of the cited book or journal article. The majority of cited works were published roughly between 1995 and 2015. Almost no cited works by female authors date from before 1970; after 1970, ever more works by female authors are cited.

Let us now plot the proportion of female-authored citations over the years.

The blue line represents the actual percentage of female-authored works for each publication year; the black line represents the 5-year moving average. Up to the publication year 1970, the proportion of female-authored references is around 5%. After that, the 5-year moving average increases from year to year (with a few exceptions), reaching over 28% in 2021.
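The yearly share and its moving average can be sketched as below. The text does not say whether the 5-year window is trailing or centered; this sketch assumes a trailing window (the year itself and the four preceding years in the data), which matches pandas' default `Series.rolling(5).mean()`, and it averages the yearly percentages rather than pooling the underlying counts. The input counts are invented.

```python
def female_share(counts: dict[int, dict[str, int]]) -> dict[int, float]:
    """Percentage of female-authored references among male + female, per year."""
    shares = {}
    for year, c in sorted(counts.items()):
        total = c.get("male", 0) + c.get("female", 0)
        if total:  # skip years without any classified reference
            shares[year] = 100 * c.get("female", 0) / total
    return shares


def trailing_moving_average(series: dict[int, float], window: int = 5) -> dict[int, float]:
    """Mean of each value and the window - 1 values before it.

    The window runs over consecutive entries in year order, so gaps in the
    data make it span more than `window` calendar years. The first
    window - 1 years get no value, as with pandas' rolling(window).mean().
    """
    years = sorted(series)
    values = [series[y] for y in years]
    return {
        years[i]: sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    }


# Invented counts per publication year.
counts = {
    2000: {"male": 3, "female": 1},
    2001: {"male": 1, "female": 1},
    2002: {"male": 2},
    2003: {"female": 2},
    2004: {"male": 1, "female": 3},
    2005: {"male": 1, "female": 1},
}
shares = female_share(counts)
print(trailing_moving_average(shares))
```

A centered window (pandas `rolling(5, center=True)`) would shift each average two years earlier and leave the last two years without a value, which would matter for a statement about the most recent year.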

In the medium-term future, we would ideally like to arrive at a state of the discipline where roughly half of the senior staff in philosophy, and hence the authors of influential philosophy articles, are women. As this analysis shows, we are still far from this. Nonetheless, the analysis also supports an optimistic conclusion: female philosophers are gaining influence in the field, and the change we witness is in a positive direction.

Who are the most cited authors?

A manual inspection of the 170 entries reveals that all entries categorized as "unknown", "mostly_male", or "mostly_female" appear to be men. We therefore relabel them as "male".
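Relabeling only the manually inspected entries can be sketched as follows. The dict-of-labels representation and the name `relabel_inspected` are hypothetical, and since the 170 inspected names are not listed in the text, the example uses placeholder names.

```python
# Categories that, per the manual inspection, turned out to belong to men.
RELABEL_AS_MALE = {"unknown", "mostly_male", "mostly_female"}


def relabel_inspected(author_genders: dict[str, str], inspected: set[str]) -> dict[str, str]:
    """Return a copy in which inspected authors with ambiguous labels become "male".

    Only authors in `inspected` (the manually checked entries) are changed;
    all other guesses are left untouched.
    """
    corrected = dict(author_genders)
    for author in inspected:
        if corrected.get(author) in RELABEL_AS_MALE:
            corrected[author] = "male"
    return corrected


# Placeholder authors: D is ambiguous but was not inspected, so it keeps its label.
genders = {"A": "unknown", "B": "female", "C": "mostly_female", "D": "mostly_male"}
print(relabel_inspected(genders, {"A", "B", "C"}))
```

Restricting the rule to the inspected set matters: the observation that ambiguous labels turned out to be men was only checked for those 170 entries, so it should not be applied to the rest of the data.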

Now, let us look at the ten most cited men and the ten most cited women. To this end, we create a dataframe containing their names together with their gender-specific rank.
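A sketch of the gender-specific ranking, assuming (hypothetically) one `(author, gender)` pair per citation. `Counter.most_common` sorts by count, and ties keep first-seen order. The notebook builds a dataframe; this sketch returns plain rows of rank, man, his citations, woman, and her citations. The citation data are invented.

```python
from collections import Counter


def top_by_gender(citations: list[tuple[str, str]], n: int = 10) -> dict[str, list[tuple[str, int]]]:
    """Rank authors separately for each gender by number of citations."""
    counts = Counter(citations)  # (author, gender) -> number of citations
    ranking = {}
    for gender in ("male", "female"):
        ranked = [(author, c) for (author, g), c in counts.most_common() if g == gender]
        ranking[gender] = ranked[:n]
    return ranking


def side_by_side(ranking: dict[str, list[tuple[str, int]]]) -> list[tuple]:
    """Pair the i-th most cited man with the i-th most cited woman."""
    return [
        (rank, man, m_count, woman, w_count)
        for rank, ((man, m_count), (woman, w_count)) in enumerate(
            zip(ranking["male"], ranking["female"]), start=1
        )
    ]


# Invented citations, one tuple per citation.
cites = [("A", "male")] * 5 + [("B", "male")] * 3 + [("X", "female")] * 4 + [("Y", "female")]
print(side_by_side(top_by_gender(cites)))
```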

We can see that David Lewis is by far the most cited author in the SEP, followed by John Rawls and Bertrand Russell. Among female philosophers, Martha Nussbaum is the most cited, followed by Christine Korsgaard and Julia Annas. It is striking that each of the ten most cited male philosophers has significantly more citations than the female philosopher at the same rank. Even Martha Nussbaum, the most cited female philosopher, has fewer citations than the ninth most cited male philosopher. This further confirms that there is a strong imbalance in the representation of the genders in the SEP.

Affiliations and geographical representation

Let us now look at which universities the researchers cited in the SEP are affiliated with, and where these citations are located geographically. To this end, we extract data on researcher affiliations from PhilPeople (http://www.philpeople.org). Unfortunately, affiliation data could be obtained for only roughly half of the citations (72140 of 147599, or ~49%). The missing data may have a number of causes: historical figures are typically not listed on PhilPeople, and neither are researchers from fields other than philosophy. If these are the main causes, we are not analyzing where citations in the SEP come from in general, but where citations of contemporary philosophy come from. This, however, is an interesting question in its own right. All in all, we find that the cited works come from researchers at 1070 different universities.
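The coverage figures can be sketched as a simple lookup. Here `affiliations` is a hypothetical author-to-university mapping standing in for the PhilPeople data (how that data is scraped or matched is not described here), and `cited_authors` holds one entry per citation. With the counts reported above, 100 × 72140 / 147599 ≈ 48.9, which is the ~49% in the text.

```python
def affiliation_coverage(cited_authors: list[str], affiliations: dict[str, str]) -> tuple[int, float, int]:
    """Return (matched citations, matched share in %, number of distinct universities).

    `cited_authors` holds one entry per citation; `affiliations` maps an author
    to a university (a hypothetical stand-in for the PhilPeople data).
    """
    matched = [affiliations[a] for a in cited_authors if a in affiliations]
    share = 100 * len(matched) / len(cited_authors) if cited_authors else 0.0
    return len(matched), share, len(set(matched))


# Invented example: author "b" has no affiliation record.
print(affiliation_coverage(["a", "b", "c", "a"], {"a": "NYU", "c": "Oxford"}))
```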

We can see that works by researchers affiliated with NYU account for 1.3% of all citations in the SEP; that is, more than one in every hundred citations is of work by an NYU researcher. Of the ten universities that account for the most citations, nine are in the US; the only exception is Oxford in the UK.
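Ranking universities by their share of citations can be sketched as below. The text does not state whether percentages such as NYU's 1.3% are relative to all citations or only to those with affiliation data, so this sketch makes the denominator an explicit, optional parameter. The input list (one university per matched citation) and its contents are invented.

```python
from collections import Counter
from typing import Optional


def top_universities(
    matched: list[str], n: int = 10, denominator: Optional[int] = None
) -> list[tuple[str, float]]:
    """Top-n universities with their percentage of citations.

    By default the percentage is relative to the matched citations only;
    pass `denominator` (e.g. the total number of citations) to change that.
    """
    counts = Counter(matched)
    total = denominator if denominator is not None else len(matched)
    return [(uni, 100 * c / total) for uni, c in counts.most_common(n)]


matched = ["NYU", "NYU", "NYU", "Oxford"]
print(top_universities(matched))
print(top_universities(matched, denominator=8))
```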

It is, of course, not surprising that an English-language encyclopedia mostly cites works published in English. Still, it is interesting to see to what extent English-speaking countries dominate the SEP, and which institutions outside the English-speaking world are cited.

To examine where the citations come from geographically, we need geocoding data, which we obtain from the Google Maps Geocoding API.
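A sketch of a lookup against the Geocoding API's JSON endpoint, using only the standard library. It assumes you have an API key (the `api_key` argument) and sends the institution name as the `address` parameter; the parser reads the coordinates and the country from the first result. The function names are illustrative, and the notebook may well have used a client library instead. Caching results is advisable, since every request counts against the API quota.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address: str, api_key: str) -> str:
    """Build the request URL for one institution name."""
    return GEOCODE_URL + "?" + urllib.parse.urlencode({"address": address, "key": api_key})


def parse_geocode(response: dict) -> Optional[tuple[float, float, Optional[str]]]:
    """Return (lat, lng, country) from the first result, or None if nothing was found."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    first = response["results"][0]
    location = first["geometry"]["location"]
    country = next(
        (c["long_name"] for c in first.get("address_components", []) if "country" in c["types"]),
        None,
    )
    return location["lat"], location["lng"], country


def geocode(address: str, api_key: str) -> Optional[tuple[float, float, Optional[str]]]:
    """Perform the network request and parse the JSON answer."""
    with urllib.request.urlopen(geocode_url(address, api_key)) as resp:
        return parse_geocode(json.load(resp))
```

The parsing is kept separate from the network call so it can be checked offline against a saved response.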

Here we can see the ten most cited institutions from non-English-speaking countries. The clear leader is the University of Amsterdam, followed by Ludwig-Maximilians-Universität München and Stockholm University.
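Filtering out English-speaking countries before ranking can be sketched as follows. Which countries count as English-speaking is a judgment call that the text does not spell out; the set below is an assumption (Canada and South Africa, for instance, are multilingual). The input is a hypothetical list with one `(institution, country)` pair per citation, and the names in the example are placeholders.

```python
from collections import Counter

# Assumption: this set of "English-speaking" countries is a judgment call.
ENGLISH_SPEAKING = {"United States", "United Kingdom", "Canada", "Australia", "Ireland", "New Zealand"}


def top_non_english(institution_countries: list[tuple[str, str]], n: int = 10) -> list[tuple[str, int]]:
    """Most cited institutions located outside the assumed English-speaking countries."""
    counts = Counter(
        inst for inst, country in institution_countries if country not in ENGLISH_SPEAKING
    )
    return counts.most_common(n)


pairs = [("Uni A", "Netherlands")] * 3 + [("Uni B", "United States")] * 5 + [("Uni C", "Germany")] * 2
print(top_non_english(pairs))
```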

Let us now look more generally at how the citations are distributed across the countries in which the authors' affiliations are located.

Over 64% of the citations for which we could identify the affiliation come from researchers affiliated with a US institution, so the US accounts for by far the largest share of citations. Next in line is the UK with 12.5% of citations, followed by Canada (~5%), Australia (~4%), and Germany (~2%). That English-speaking countries are overrepresented in an English-language encyclopedia is unsurprising, but the extent of their dominance may still come as a surprise: together, they account for over 85% of the identifiable citations.
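The country shares and the combined English-speaking share can be sketched as below. The reported figures are consistent with the 85% claim: the US, UK, Canada, and Australia alone give roughly 64 + 12.5 + 5 + 4 = 85.5%. As before, the set of English-speaking countries is an assumption, and the input (one country per identifiable citation) is invented.

```python
from collections import Counter

# Assumption: this set of "English-speaking" countries is a judgment call.
ENGLISH_SPEAKING = {"United States", "United Kingdom", "Canada", "Australia", "Ireland", "New Zealand"}


def country_shares(countries: list[str]) -> dict[str, float]:
    """Percentage of (identifiable) citations per country, largest first."""
    counts = Counter(countries)
    return {country: 100 * n / len(countries) for country, n in counts.most_common()}


def english_speaking_share(countries: list[str]) -> float:
    """Combined percentage of citations from the assumed English-speaking countries."""
    return 100 * sum(c in ENGLISH_SPEAKING for c in countries) / len(countries)


countries = ["United States"] * 6 + ["United Kingdom"] * 2 + ["Germany"] * 2
print(country_shares(countries))
print(english_speaking_share(countries))
```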

Lastly, we can inspect a map of the institutions the citations come from, on which the size and color of each institution's marker represent the size of its contribution. There are dense clusters in North America and Europe, but also substantial contributions from some areas of South America, some countries in Asia, and Australia. Very few cited works are by authors at institutions on the African continent, with the exception of South Africa.